Hi! I’m a data scientist in industry, by way of education research. I am deeply interested in sociology, the data science industry (and making it better), R programming, and education. If you’re in need of a speaker on one of these topics, let me know!

Also, I am committed to helping other women get into the data science field, so if you need advice or support, contact me!



I code in R, SQL, and Python, using RStudio and SublimeText as my preferred IDE/code editors, or Jupyter notebooks for some Python projects. I also enjoy writing LaTeX with Sweave and R, to make snazzy print reports.

To see more about what kinds of data science skills I have, check out my projects on this page, or my GitHub profile.

To manage code and tasks, I use Git and Airflow. I also have experience with Elasticsearch databases and R package development.

This site is built in R Markdown and a bit of JavaScript.




Projects

I enjoy doing data projects in my spare time, and while you can find most of them on Kaggle or GitHub, here are direct links to some of my favorites.


Kiva Loan Data Analysis

This is a work in progress, but I’m planning to complete several components:

  • GIS analysis of loans by country with attention to economic conditions in countries (done)
  • Drilldown on some of the thematic areas of the loans (See Agriculture)
  • Data munging of the regional data provided
  • Exploring modeling potential: whether repayment time can be predicted, or anomaly detection if labeling outcomes is not possible

Come back to check out the latest as I continue working!




Fun with Real Estate Data

This project is a Kaggle kernel, in which I walked the reader through the process of cleaning and modeling the data from a real estate prices dataset, using linear modeling, random forests, and gradient boosting (xgboost). My most popular kernel to date! This one also produced respectable competition results, and was chosen for special recognition by the Kaggle admins. (I won a mug!)

Update: Read the interview I did regarding this project (and the other fabulous winners)! http://blog.kaggle.com/2017/03/29/predicting-house-prices-playground-competition-winning-kernels

Key Skills: machine learning, data cleaning




Data for Democracy 2017 Hackathon

I led a team working on the Chicago Lobbying project, which produced some great output, including this visualization of lobbying and aldermen in Chicago. The project is continuing and building out new functionality. I personally cleaned some of the underlying data, but my biggest contribution was organizing, planning, and leadership. Additional results: https://data.world/lilianhj/chicago-lobbyists

Update: Check out a case study by the fine folks at data.world discussing the work that went into this project: https://medium.com/@sharonbrener/dbf30aeee70b




Exploring Austin, Texas Crime

Among the public datasets available on Kaggle is this one, describing the crimes that have occurred in Austin, TX over a couple of years. This project cleans the data, does some exploratory analysis, and maps various kinds of crime by district.

Key Skills: data cleaning, GIS




Interruptions at the First Presidential Debate

My first natural language processing/text mining project! This was a lot of fun, because I watched the debate and then was able to examine how well my actual perceptions matched what the data told me.

Key Skills: NLP, data cleaning




Manufacturers and the Drugs They Make

This project is part of my work for Data for Democracy, a great loose association of data scientists working on projects for public benefit in their free time. This was my first Shiny app, and it’s always getting a few tweaks and improvements when I have time.

Key Skills: web coding, data visualization, Shiny




Medical No-Shows in a Brazilian Hospital

What features of patients help providers predict who is at risk of not showing up to appointments? This one provides insights that the hospital that supplied the data could use to improve its patient care.

Key Skills: data cleaning, modeling, data visualization, machine learning




Dental Care in the ACA Marketplace

I think this is a good kernel, but it never got traction because the data was not glamorous and the results were not very cheerful. In short, the dental coverage from the ACA is seriously inadequate for population needs, unfortunately.

Key Skills: data cleaning, GIS




Tidy Text Mining in Facebook Posts

In this project, I used a provided dataset of Facebook posts from a community group and analyzed a few details about the content: specifically, how sentiment and gender related to “likes” on the posts.

Key Skills: NLP, data cleaning, data visualization






s.kirmer@gmail.com | kaggle.com/skirmer

See what I’m reading on Pocket: http://getpocket.com/@data_stephanie





Events and Fun Stuff

Applications are now open for a great one-day workshop I am co-organizing with Angela Li! This will be the very first USA-based R Forwards women’s #rstats package workshop, and it’s going to be awesome. Don’t miss it!


Unfortunately the Domino Data Science Pop-Up in Chicago on December 6 was canceled, but I hope to be presenting this material in the spring at another conference - watch this space! This talk will be about integrating Elasticsearch storage with R or Python analytics workflows.


If you came to see me speak about R packages for team collaboration, you can get the slides and supporting materials on GitHub. If you have questions or need help producing your own packages, hit me up on Twitter.